Higher criticism for discriminating word-frequency tables and authorship attribution
Authors
Abstract
We adapt the Higher Criticism (HC) goodness-of-fit test to measure the closeness between word-frequency tables. We apply this measure to authorship attribution challenges, where the goal is to identify the author of a document using other documents whose authorship is known. The method is simple yet performs well without handcrafting and tuning, reporting accuracy at the state-of-the-art level in various current challenges. As an inherent side effect, the HC calculation identifies a subset of discriminating words. In practice, the identified words have low variance across documents belonging to a corpus of homogeneous authorship. We conclude that in comparing the similarity of a new document and a corpus of a single author, HC is mostly affected by words characteristic of the author and is relatively unaffected by topic structure.
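The idea in the abstract can be illustrated with a minimal sketch, which is not the paper's implementation. It assumes one two-sided binomial p-value per word, a null probability equal to the first table's share of all tokens, and a search over the smallest `gamma` fraction of p-values; the function names, the value `gamma=0.35`, and the toy counts are all hypothetical choices made for illustration.

```python
import math
from collections import Counter


def binom_two_sided_pvalue(x, n, p):
    """P(|K - n*p| >= |x - n*p|) for K ~ Binomial(n, p), by direct summation."""
    dev = abs(x - n * p)
    total = 0.0
    for k in range(n + 1):
        if abs(k - n * p) >= dev - 1e-12:  # small tolerance for float ties
            total += math.comb(n, k) * p**k * (1 - p) ** (n - k)
    return min(total, 1.0)


def hc_distance(counts1, counts2, gamma=0.35):
    """Return (HC score, discriminating words) for two word-count tables.

    Assumption: under the null, each occurrence of a word falls in table 1
    with probability equal to table 1's share of all tokens.
    """
    t1, t2 = sum(counts1.values()), sum(counts2.values())
    p = t1 / (t1 + t2)
    pvals = []
    for word in set(counts1) | set(counts2):
        x = counts1.get(word, 0)
        n = x + counts2.get(word, 0)
        pvals.append((binom_two_sided_pvalue(x, n, p), word))
    pvals.sort()  # ascending p-values
    size = len(pvals)
    if size < 2:
        return 0.0, []
    best_score, best_i = -math.inf, 0
    limit = max(1, int(gamma * size))  # only the smallest p-values are searched
    for i in range(1, limit + 1):
        u = i / size
        pv = pvals[i - 1][0]
        score = math.sqrt(size) * (u - pv) / math.sqrt(u * (1 - u))
        if score > best_score:
            best_score, best_i = score, i
    # Words whose p-values fall at or below the maximizing index
    return best_score, [word for _, word in pvals[:best_i]]


# Toy tables: identical except for "whilst" versus "while"
doc_a = Counter({"the": 50, "of": 30, "and": 40, "to": 20, "whilst": 10})
doc_b = Counter({"the": 50, "of": 30, "and": 40, "to": 20, "while": 10})
score_ab, words_ab = hc_distance(doc_a, doc_b)
score_aa, _ = hc_distance(doc_a, doc_a)
print(score_ab > score_aa, sorted(words_ab))
```

In this toy case the two tables differ only in one pair of words, so those words receive the small p-values and form the discriminating subset, while the shared function words do not contribute. An attribution rule along these lines would assign a document to the candidate corpus with the smallest score.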
Similar resources
Authorship Attribution Using Word Sequences
Authorship attribution is the task of identifying the author of a given text. The main concern of this task is to define an appropriate characterization of documents that captures the writing style of authors. This paper proposes a new method for authorship attribution supported on the idea that a proper identification of authors must consider both stylistic and topic features of texts. This me...
Authorship Attribution Using Word Network Features
In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends to show complex network structure at word level, with low degrees of separation and scale-free (power law) degree distribution. There has also been work on ...
More than Word Frequencies: Authorship Attribution via Natural Frequency Zoned Word Distribution Analysis
With the increasing popularity and availability of digital text data, authorship of digital texts cannot be taken for granted due to the ease of copying and parsing. This paper presents a new text style analysis called natural frequency zoned word distribution analysis (NFZ-WDA), and then a basic authorship attribution scheme and an open authorship attribution scheme for digital texts based ...
Authorship Attribution
Authorship attribution, the science of inferring characteristics of the author from the characteristics of documents written by that author, is a problem with a long history and a wide range of application. Recent work in “non-traditional” authorship attribution demonstrates the practicality of automatically analyzing documents based on authorial style, but the state of the art is confusing. An...
Journal
Journal title: The Annals of Applied Statistics
Year: 2022
ISSN: 1941-7330, 1932-6157
DOI: https://doi.org/10.1214/21-aoas1544